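The GPU Developer's Creed establishes a foundational philosophy in which functional integrity and architectural isolation take precedence over raw throughput. In the ROCm ecosystem, HIP enables massive concurrency, so the Creed treats every kernel as a high-risk, isolated black box.
1. The Primacy of Correctness
In HIP development, a 'fast' result that is statistically inconsistent is a failure. We prioritize verifiable mathematical correctness across the entire ROCm stack. Without correctness, performance is meaningless.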
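One way to make "verifiable mathematical correctness" concrete is a parity check against a sequential CPU reference. The sketch below is a host-side harness in Python, not HIP code. The names `saxpy_reference`, `find_mismatches`, and `passes_parity`, the SAXPY operation itself, and the tolerance defaults are all assumptions chosen for illustration; the candidate list stands in for output copied back from a kernel.

```python
import math


def saxpy_reference(a, xs, ys):
    """Sequential CPU 'golden model' for y = a*x + y (hypothetical example op)."""
    return [a * x + y for x, y in zip(xs, ys)]


def find_mismatches(reference, candidate, rel_tol=1e-6, abs_tol=1e-9):
    """Return the indices where candidate deviates from reference beyond tolerance.

    NaN never compares close to anything, so NaN outputs are always reported.
    """
    if len(reference) != len(candidate):
        raise ValueError("length mismatch between reference and candidate")
    return [
        i
        for i, (r, c) in enumerate(zip(reference, candidate))
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


def passes_parity(reference, candidate, **tolerances):
    """True only when every element is within tolerance of the reference."""
    return not find_mismatches(reference, candidate, **tolerances)
```

The tolerance values here are placeholders; in practice they would be chosen per kernel from its expected numerical error rather than copied from this sketch.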
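2. Isolation as a Diagnostic Safeguard
By enforcing strict isolation between host-side management and device-side execution, and by minimizing global state and side effects, we turn non-deterministic concurrency bugs into reproducible logical units.
3. Memory/Concurrency Fatalism
We accept that memory corruption and race conditions are the primary 'predators' of GPU performance. Because HIP is the primary low-level programming interface, the Creed stipulates that every new kernel must start from conservative synchronization and explicit memory ownership as its baseline.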
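The baseline of explicit ownership plus conservative synchronization can be illustrated with a CPU-thread analogy. This is a sketch in Python, not HIP code, and `partitioned_sum` is a hypothetical name. Each worker owns exactly one output slot, loosely comparable to a block writing its own partial result, and the `join` calls play the role of an explicit synchronization point before anything reads those slots.

```python
import threading


def partitioned_sum(values, num_workers=4):
    """Sum values with worker threads that each own exactly one output slot."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")

    # Ownership rule: slot i is written only by worker i, so no two
    # workers ever write to the same location.
    partials = [0.0] * num_workers

    def worker(worker_id):
        total = 0.0
        # Strided partition: worker i reads elements i, i+n, i+2n, ...
        for v in values[worker_id::num_workers]:
            total += v
        partials[worker_id] = total  # single write to the owned slot

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    # Conservative synchronization: wait for every worker to finish
    # before the partial results are read.
    for t in threads:
        t.join()
    return sum(partials)
```

Because writes never overlap and the final read happens only after every worker has finished, the result does not depend on thread scheduling.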
QUESTION 1
According to the Creed, what is a statistically inconsistent 'fast' result considered?
An acceptable trade-off for real-time systems.
A failure.
A 'heuristic' optimization.
A driver-level anomaly.
✅ Correct!
Correctness is the foundation; a fast but wrong answer is useless in scientific and production computing.
❌ Incorrect
The Creed explicitly states that speed without verifiable correctness is a failure.
QUESTION 2
Why is 'Isolation' emphasized in the GPU development workflow?
To prevent the GPU from accessing host memory.
To reduce the electricity consumption of the ROCm stack.
To transform non-deterministic concurrency bugs into reproducible logical units.
To hide kernel source code from other developers.
✅ Correct!
Isolation allows you to debug specific units without the noise of global state or asynchronous race conditions.
❌ Incorrect
Isolation is a diagnostic strategy to make bugs reproducible.
QUESTION 3
In the 'Hierarchy of Needs' for GPU development, what forms the wide base?
Peak TFLOPS Tuning.
Functional Correctness (CPU Parity).
Shared Memory Optimization.
Inline Assembly.
✅ Correct!
CPU parity ensures the mathematical logic is sound before GPU-specific complexities are added.
❌ Incorrect
Check the pyramid visual: Functional Correctness is the widest, most critical layer.
QUESTION 4
What does 'Memory/Concurrency Fatalism' imply for a developer?
Assuming that memory will never fail.
Accepting that race conditions are the primary predators of performance.
Ignoring error codes from hipMalloc.
Assuming the compiler handles all synchronization.
✅ Correct!
Fatalism here means recognizing the inherent dangers of parallel memory access and planning for them from the start.
❌ Incorrect
Fatalism means assuming these errors WILL happen unless specifically prevented.
QUESTION 5
What is the recommended first step when implementing a complex kernel like an FFT?
Optimize shared memory usage immediately.
Use inline PTX assembly for speed.
Implement a strictly isolated version using global memory and explicit synchronization.
Disable all error checking to measure raw latency.
✅ Correct!
Verified global memory logic serves as the 'Gold Standard' before introducing complex shared memory tiling.
❌ Incorrect
Jumping to shared memory shuffles before verifying the logic violates the Creed's correctness-first rule.
Case Study: The 'Fast but Wrong' Wavefront
Debugging a 3D Stencil Kernel
A developer migrates a 3D Wavefront Reconstruction kernel to ROCm. To maximize speed, they use volatile shared memory and skip hipDeviceSynchronize() calls. The output is 100x faster than the CPU but 2% of the values are slightly off-target during high-load production runs.
Q
Based on the GPU Developer's Creed, what is the immediate priority for this developer?
Solution:
The priority is Functional Correctness. The developer must revert the optimizations (shared memory/async) and implement a strictly isolated version using global memory and explicit synchronization to find the 'Golden Model' discrepancy.
Q
Which layer of the Hierarchy of Needs did the developer skip?
Solution:
The developer skipped the base layer (Functional Correctness) and the middle layer (Isolation & Safety) to jump directly to the narrow tip (Performance Tuning).
Q
How does 'Isolation' help solve the 2% error rate in this scenario?
Solution:
By isolating the kernel and comparing it bit-for-bit against a CPU reference, the developer can determine if the error is a logical math flaw or a non-deterministic race condition caused by shared memory concurrency.
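The distinction drawn in this solution, a logic flaw versus a race condition, can be sketched as a small triage routine. It is a host-side illustration in Python rather than HIP code. `classify`, `bits`, and the `run_candidate` callable (assumed to launch the isolated kernel once and return its output as a list of floats) are hypothetical names, and the default of five runs is arbitrary.

```python
import struct


def bits(values):
    """Pack floats as IEEE-754 doubles so that comparison is bit-for-bit."""
    return struct.pack(f"<{len(values)}d", *values)


def classify(reference, run_candidate, runs=5):
    """Triage a discrepancy by repeating the run and comparing raw bits.

    run_candidate is a hypothetical zero-argument callable that executes
    the isolated kernel once and returns its output as a list of floats.
    """
    outputs = [bits(run_candidate()) for _ in range(runs)]
    if len(set(outputs)) > 1:
        # Runs disagree with each other: suspect a race condition.
        return "non-deterministic"
    if outputs[0] != bits(reference):
        # Same wrong answer every time: suspect a logic or math flaw.
        return "deterministic mismatch"
    return "match"
```

Agreement across a handful of runs does not prove that no race exists; it only fails to expose one. A bitwise difference from the CPU reference can also come from a different floating-point operation order, so a "deterministic mismatch" is a prompt to inspect the math, not a verdict.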